A single social phenomenon (such as crime, unemployment or birth rate) can be observed through temporal series corresponding to units at different levels (cities, regions, countries...). Units at a given local level may follow a collective trend imposed by external conditions, but they may also display fluctuations of purely local origin. The local behavior is usually computed as the difference between the local data and a global average (e.g. a national average), a viewpoint which can be very misleading. We propose here a method for separating the local dynamics from the global trend in a collection of correlated time series. We take an independent component analysis approach in which, in contrast with previously proposed methods, we do not assume a small unbiased local contribution. We first test our method on synthetic series generated by correlated random walkers. We then consider crime rate series (in the US and France) and the evolution of the obesity rate in the US, which are two important examples of societal measures. For crime rates, the separation between global and local policies is a major subject of debate. For the US, we observe large fluctuations in the transition period of the mid-70's, during which crime rates increased significantly, whereas since the 80's the state crime rates have been governed by external factors and the importance of local specificities has been decreasing. In the case of obesity, our method shows that external factors have dominated the evolution of obesity since 2000, and that different states can have different dynamical behaviors even if their obesity prevalence is similar.
INTRODUCTION

Large complex systems are composed of various interconnected components. The measured behavior of a single component thus results from the superposition of different factors acting at different levels. Common factors such as global trends or external socio-economic conditions obviously play a role, but different subunits (such as users of the Internet, or states or regions in a country) will usually react in different ways and add their local dynamics to the collective pattern. For example, the number of downloads on a website depends on factors such as the time of day, but one can also observe fluctuations from one user to another. In the case of criminality, favorable socio-economic conditions will impose a global decreasing trend, while local policies will affect the regional time series. In the case of financial series, the market imposes its own trend and some stocks respond to it more or less dramatically. In all these cases it is important to be able to distinguish whether the stocks or regions are at the source of their own fluctuations or whether, on the contrary, they just follow the collective trend.

Extracting local effects in a collection of time series is thus a crucial problem for assessing the efficiency of local policies and, more generally, for understanding the causes of fluctuations. This problem is very general. As the availability of data keeps increasing, particularly in the social sciences, it is becoming ever more important for the modeling and understanding of these systems. There is obviously a huge literature on the study of stochastic signals [...], ranging from standard methods to more recent ones such as detrended fluctuation analysis, independent component analysis [...], and the separation of external and internal variables [..., 13]. Most of these methods treat the internal dynamics as a small local perturbation with zero mean, in contrast with the method proposed here.

In the first part we present the method. In the second part, we test it on synthetic series generated by correlated random walkers. We then apply the method to empirical data on crime rates in the US and France, and on obesity rates in the US. To our knowledge, no general quantitative method is known to provide such a separation between global and local trends for these data.
MODEL AND METHOD

In general, one has a set of time series $\{f_i(t)\}_{i=1,\dots,N}$, where $t = 1, \dots, T$, and we will assume that the number N of units is large. The index i refers to a particular unit on a specific scale, such as a region, a city or a country. The problem we address consists in extracting the collective trend and the effect of local contributions. One way to do so is to assume the signal $f_i(t)$ to be of the form

$$ f_i(t) = f_i^{\mathrm{ext}}(t) + f_i^{\mathrm{int}}(t) \qquad (1) $$

where the 'external' part, $f_i^{\mathrm{ext}}(t)$, represents the impact on region i of a global trend, while the 'internal' part, $f_i^{\mathrm{int}}(t)$, represents the contribution due to purely local factors. Usually, in order to discuss the impact of local policies, one compares a regional (local) curve $f_i$ to the average (the national average in the case of the regions of a country), computed as

$$ f^{\mathrm{av}}(t) = \frac{1}{N}\sum_i f_i(t) \qquad (2) $$

(or $f^{\mathrm{av}} = \sum_i n_i f_i / \sum_i n_i$ if one has intensive variables and populations $n_i$). Although reasonable at first sight, this assumes that the local component is purely additive: $f_i(t) = f^{\mathrm{av}}(t) +$ local term. In this article, following [1, ...], we will instead consider the possibility of having both multiplicative and additive contributions. More specifically, we assume

$$ f_i^{\mathrm{ext}}(t) = a_i\, w(t) \qquad (3) $$

where $w(t)$ is a collective trend common to all series, which affects each region i through a corresponding prefactor $a_i$. These coefficients are assumed to depend weakly on the period considered, i.e. to vary slowly with time. We thus write

$$ f_i(t) = a_i\, w(t) + f_i^{\mathrm{int}}(t) \qquad (4) $$

We first note that the global trend w is known up to a multiplicative factor only: one cannot distinguish $a_i w$ from $(a_i z)(w/z)$ for any $z \neq 0$. We will come back to this issue of scale later. Also, the purely additive case is recovered if the $a_i$'s are independent of i. If, on the contrary, the $a_i$'s differ from one region to another, the national average $f^{\mathrm{av}} = \bar f \equiv (1/N)\sum_i f_i$ is given by

$$ \bar f(t) = \bar a\, w(t) + \overline{f^{\mathrm{int}}}(t) \qquad (5) $$

Here and in the following we denote the sample average, that is the average over all units i, by a bar, and the temporal average by brackets $\langle \cdot \rangle$. The 'naive' local contribution is then estimated by the difference with the national average:

$$ f_i^{\mathrm{int},n}(t) \equiv f_i(t) - \bar f(t) = (a_i - \bar a)\, w(t) + f_i^{\mathrm{int}}(t) - \overline{f^{\mathrm{int}}}(t) \qquad (6) $$



The estimated local contribution $f_i^{\mathrm{int},n}(t)$ can thus be very different from the original one, $f_i^{\mathrm{int}}(t)$. The difference $|f_i^{\mathrm{int},n}(t) - f_i^{\mathrm{int}}(t)|$ will be very large at all times t where $w(t)$ is large. The conclusion would be the same when taking the national average as $f^{\mathrm{av}}(t) = \sum_i n_i f_i / \sum_i n_i$. This demonstrates that comparing local time series with the naive average can in general be very misleading. Besides the correct computation of the external and internal contributions, the existence of both multiplicative and additive local contributions implies that the effect of local policies must be analyzed by considering both how the local unit i follows the global trend ($a_i$) and how the purely internal contribution ($f_i^{\mathrm{int}}$) evolves.
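As a minimal numerical check of Eq. (6), the sketch below builds synthetic data from Eq. (4). The prefactors $a_i$, the trend $w(t)$ and the small internal terms are all hypothetical and are not taken from the data sets discussed later. It verifies that the naive estimate $f_i - f^{\mathrm{av}}$ differs from the true $f_i^{\mathrm{int}}$ by exactly $(a_i - \bar a)w(t) - \overline{f^{\mathrm{int}}}(t)$, and that this bias can be much larger than the local signal itself.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 200
t = np.arange(T)

# Hypothetical synthetic data following Eq. (4): f_i(t) = a_i w(t) + f_i^int(t)
w = 10.0 + 5.0 * np.sin(2 * np.pi * t / T)     # global trend w(t)
a = rng.uniform(0.5, 1.5, size=N)              # prefactors a_i
f_int = rng.normal(0.0, 0.3, size=(N, T))      # small internal terms
f = a[:, None] * w[None, :] + f_int

f_av = f.mean(axis=0)                          # national average, Eq. (2)
f_int_naive = f - f_av[None, :]                # naive local contribution, Eq. (6)

# Eq. (6) predicts the error exactly: (a_i - abar) w(t) - mean_i f_i^int(t)
error = f_int_naive - f_int
predicted = (a - a.mean())[:, None] * w[None, :] - f_int.mean(axis=0)[None, :]

print("max |error - predicted|:", np.abs(error - predicted).max())
print("mean |error|:", np.abs(error).mean(), " mean |f_int|:", np.abs(f_int).mean())
```

In this setup the bias grows with $w(t)$ and with the spread of the $a_i$'s. A purely additive model (all $a_i$ equal) would leave only the small $\overline{f^{\mathrm{int}}}(t)$ term.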

In a previous study [...], Menezes and Barabási proposed a simple method to separate the two contributions, internal ($f_i^{\mathrm{int}}$) and external ($f_i^{\mathrm{ext}}$, written as $a_i w(t)$). They assume that the temporal average $\langle f_i^{\mathrm{int}}\rangle$ is zero, and compute the external and internal parts by writing

$$ a_i = \frac{\langle f_i\rangle}{\langle \bar f\rangle} \qquad (7) $$

and $f_i^{\mathrm{ext}}(t) = a_i \bar f(t)$. This method can be shown to be correct in very specific situations, such as the case where $f_i$ is the fluctuating number of random walkers at node i in a network. In many cases, however, one can expect the local contributions to have a nonzero sample average, and the method of [...] will then yield incorrect results. Indeed, if the hypothesis Eq. (4) is exact, this method would give for w the estimate $\hat w(t) = \bar a\, w(t) + \overline{f^{\mathrm{int}}}(t)$. In the limit $|w(t)| \to \infty$ for $t \to \infty$, it would lead to the estimates $\hat a_i \simeq a_i/\bar a$ and $\hat f_i^{\mathrm{int}} \simeq f_i^{\mathrm{int}} - a_i \overline{f^{\mathrm{int}}}/\bar a$, which differ from the exact results unless $\overline{f^{\mathrm{int}}} = 0$.
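The sketch below applies the split defined by Eq. (7) to hypothetical synthetic data following Eq. (4). By construction this split forces $\langle \hat f_i^{\mathrm{int}}\rangle = 0$ for every unit. It therefore recovers $f_i^{\mathrm{int}}$, up to a term of order $\overline{f^{\mathrm{int}}}(t)$, only when the true internal terms have zero temporal mean. When they do not, the error is at least of the order of $\langle f_i^{\mathrm{int}}\rangle$. The trend, prefactors and offsets are illustrative assumptions.

```python
import numpy as np

def menezes_barabasi_split(f):
    """Split an (N, T) array as in Eq. (7): a_i = <f_i>/<fbar>, f_i^ext = a_i fbar(t)."""
    f_bar = f.mean(axis=0)                     # sample average over units
    a_hat = f.mean(axis=1) / f_bar.mean()      # Eq. (7)
    f_ext = a_hat[:, None] * f_bar[None, :]
    return a_hat, f - f_ext                    # estimated internal part

rng = np.random.default_rng(1)
N, T = 100, 400
t = np.arange(T)
w = 5.0 + 2.0 * np.sin(2 * np.pi * t / 100)    # hypothetical trend w(t)
a = rng.uniform(0.5, 1.5, size=N)              # hypothetical prefactors a_i
noise = rng.normal(0.0, 0.3, size=(N, T))
noise -= noise.mean(axis=1, keepdims=True)     # exactly zero temporal mean

errors = {}
for label, offset in [("zero-mean f_int", 0.0), ("nonzero-mean f_int", 2.0)]:
    f_int = offset * rng.uniform(0.5, 1.5, size=(N, 1)) + noise
    f = a[:, None] * w[None, :] + f_int
    _, f_int_hat = menezes_barabasi_split(f)
    errors[label] = np.sqrt(np.mean((f_int_hat - f_int) ** 2))
    print(f"{label}: rms error = {errors[label]:.3f}")
```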

In order to separate the two contributions, we propose in this article a totally different approach. We take an independent component analysis point of view in which we do not assume that the local contribution has a zero average (over time and/or over the regions). We express two ideas. First, the 'internal' contribution is by definition what is specifically independent of the global trend. Second, the correlations between regions exist essentially only through their dependence on the global trend. Accordingly, we impose that the global trend is statistically independent from the local fluctuations,

$$ \langle w\, f_i^{\mathrm{int}}\rangle_c = 0 \qquad (8) $$

where we denote by $\langle \cdot \rangle_c$ the connected correlation, $\langle AB\rangle_c = \langle AB\rangle - \langle A\rangle\langle B\rangle$. We also impose that these local fluctuations are essentially independent from region to region, that is, for $i \neq j$,

$$ \langle f_i^{\mathrm{int}} f_j^{\mathrm{int}}\rangle_c \simeq 0 \qquad (9) $$

This statement will be made more precise below. We show that, for large N, the constraints (8) and (9) are sufficient to extract estimates of the global trend w and of the $a_i$'s.

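We denote by $\mu_w$ the average of w and by $\sigma_w$ its dispersion, so that we write

$$ w(t) = \mu_w + \sigma_w W(t) \qquad (10) $$

with $\langle W\rangle = 0$ and $\langle W^2\rangle = 1$. If we denote $F_i(t) = f_i(t) - \langle f_i\rangle$ and $G_i = f_i^{\mathrm{int}} - \langle f_i^{\mathrm{int}}\rangle$, we have

$$ F_i(t) = A_i W(t) + G_i(t) \qquad (11) $$

with

$$ A_i = a_i \sigma_w \qquad (12) $$

Note that $(\sigma_i^{\mathrm{ext}})^2 = \langle (f_i^{\mathrm{ext}})^2\rangle_c = A_i^2$. If we now consider the correlations between these centered quantities, $C_{ij} = \langle F_i F_j\rangle$, we find, using Eq. (8),

$$ C_{ij} = A_i A_j + \langle G_i G_j\rangle \qquad (13) $$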



Assume that, for $i \neq j$, $\langle G_i G_j\rangle$ is negligible (of order 1/N) compared to $A_i A_j$; this is what we mean by having small correlations between internal components, Eq. (9). From this last expression we can then show that, at dominant order in N, we have

$$ \sum_j C_{ij} \simeq N \bar A\, A_i \qquad (14) $$

$$ \sum_{i,j} C_{ij} \simeq N^2 \bar A^2 \qquad (15) $$

These equations lead to

$$ A_i = \frac{\sum_j C_{ij}}{\left(\sum_{k,l} C_{kl}\right)^{1/2}} \qquad (16) $$

which is valid when $\overline{\langle G^2\rangle} \ll N \bar A^2$. We note that our method is meaningful only if strong correlations exist between the different $f_i$'s. If this is not the case, the definition of a global trend makes no sense and the approximations used in our calculations are not valid.
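The sketch below shows one way Eq. (16) could be computed with NumPy, on hypothetical synthetic data built from Eq. (11). The values of $A_i$, the Gaussian pattern $W$ and the internal noise are assumptions chosen so that $\overline{\langle G^2\rangle} \ll N\bar A^2$ holds. The temporal averages are replaced by sample means over the T time steps.

```python
import numpy as np

def prefactors_eq16(f):
    """Estimate A_i from an (N, T) array with Eq. (16)."""
    F = f - f.mean(axis=1, keepdims=True)      # centered series F_i(t)
    C = F @ F.T / F.shape[1]                   # C_ij = <F_i F_j>
    return C.sum(axis=1) / np.sqrt(C.sum())    # sum_j C_ij / (sum_kl C_kl)^(1/2)

rng = np.random.default_rng(2)
N, T = 200, 1000
W = rng.normal(size=T)
W = (W - W.mean()) / W.std()                   # <W> = 0, <W^2> = 1
A = rng.uniform(0.5, 1.5, size=N)              # hypothetical true A_i
G = rng.normal(0.0, 1.0, size=(N, T))          # independent internal fluctuations
f = 3.0 + A[:, None] * W[None, :] + G          # Eq. (11) plus constant means

A_hat = prefactors_eq16(f)
print("max |A_hat - A| =", np.abs(A_hat - A).max())
```

With a finite T, the residual error is dominated by the sample correlations $\langle W G_i\rangle$, which are of order $1/\sqrt{T}$ rather than exactly zero.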

In the Supporting Information (section SI1) we show that the factors $A_i$ can also be computed as the components of the eigenvector corresponding to the largest eigenvalue of $C_{ij}$. This method is valid under a weaker assumption: that only a small number (compared to N) of the off-diagonal terms of the matrix $D_{ij} = \langle G_i G_j\rangle$ are not negligible.
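The eigenvector route can be sketched as follows, on hypothetical synthetic data. The scaling $A_i \simeq \sqrt{\lambda}\,\psi_i$ is an assumption: it follows from the first-order argument of section SI1, where $\lambda \simeq A\cdot A$ once the contribution of D is neglected. The sign of $\psi$ is fixed by hand, since `numpy.linalg.eigh` returns eigenvectors up to a sign. The last line compares the result with Eq. (16).

```python
import numpy as np

def prefactors_eigvec(f):
    """A_i from the leading eigenvector of C_ij, scaled by sqrt(lambda_max)."""
    F = f - f.mean(axis=1, keepdims=True)
    C = F @ F.T / F.shape[1]
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh: ascending eigenvalues, unit eigenvectors
    lam, psi = eigvals[-1], eigvecs[:, -1]
    psi = psi * np.sign(psi.sum())             # eigenvector sign is arbitrary; make it positive
    return np.sqrt(lam) * psi                  # assumes lambda ~ A.A (D term neglected)

def prefactors_eq16(f):
    F = f - f.mean(axis=1, keepdims=True)
    C = F @ F.T / F.shape[1]
    return C.sum(axis=1) / np.sqrt(C.sum())

rng = np.random.default_rng(3)
N, T = 200, 1000
W = rng.normal(size=T)
W = (W - W.mean()) / W.std()
A = rng.uniform(0.5, 1.5, size=N)              # hypothetical true A_i
f = A[:, None] * W[None, :] + rng.normal(0.0, 1.0, size=(N, T))

A_eig, A_16 = prefactors_eigvec(f), prefactors_eq16(f)
print("max |A_eig - A| =", np.abs(A_eig - A).max())
print("max |A_eig - A_16| =", np.abs(A_eig - A_16).max())
```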

Once the quantities $A_i$ are known, we can compute the global normalized pattern $W(t)$ with the reasonable estimator given by $\bar F/\bar A$:

$$ \hat W(t) = \frac{\bar F(t)}{\bar A} = \frac{\sum_i F_i(t)}{\sum_i A_i} \qquad (17) $$

Using Eq. (11), we then have

$$ \hat W(t) = W(t) + \frac{\bar G(t)}{\bar A} \qquad (18) $$

Since the quantity $\bar G/\bar A$ is a sum of independent variables with zero mean, we can expect it to behave as $1/\sqrt{N}$. We can show that this actually results from the initial assumptions. Indeed, by construction $\langle \bar G/\bar A\rangle = 0$, and the second moment is

$$ \left\langle \left(\frac{\bar G}{\bar A}\right)^2 \right\rangle = \frac{1}{N^2 \bar A^2}\sum_{i,j}\langle G_i G_j\rangle \qquad (19) $$

By assumption we have $\langle G_i G_j\rangle \simeq 0$ if $i \neq j$, and we thus obtain $\bar G/\bar A \sim 1/\sqrt{N}$.
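The $1/\sqrt{N}$ behavior of the error term in Eq. (18) can be checked numerically. The sketch below computes $\hat W$ from Eq. (17) using the exact (hypothetical) $A_i$'s, so that only the $\bar G/\bar A$ term remains. It then checks that $\mathrm{rms}(\hat W - W)\sqrt{N}$ stays roughly constant as N grows. The sinusoidal pattern and unit-variance internal noise are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 300
W = np.sin(2 * np.pi * np.arange(T) / 50)
W = (W - W.mean()) / W.std()                   # normalized pattern: <W> = 0, <W^2> = 1

scaled_errors = []
for N in (25, 100, 400, 1600):
    A = rng.uniform(0.5, 1.5, size=N)          # hypothetical prefactors
    G = rng.normal(0.0, 1.0, size=(N, T))      # independent internal terms
    F = A[:, None] * W[None, :] + G
    F -= F.mean(axis=1, keepdims=True)         # centered series F_i(t)
    W_hat = F.mean(axis=0) / A.mean()          # Eq. (17), here with the exact A_i
    rms = np.sqrt(np.mean((W_hat - W) ** 2))
    scaled_errors.append(rms * np.sqrt(N))     # Eq. (19): roughly constant if rms ~ 1/sqrt(N)
    print(f"N = {N:5d}  rms(W_hat - W) = {rms:.4f}  rms*sqrt(N) = {rms * np.sqrt(N):.3f}")
```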

The computation of the $A_i$'s and of W is equivalent to an independent component analysis (ICA) [5] with a single source (the global trend) and a large number N of sensors. However, in contrast with standard ICA, we are interested not only in the sources (here the trend W) but also in the internal contributions. In a standard ICA framework, these would be considered as noise terms, typically assumed to be small. We already have the $A_i$'s, and since $W(t)$ has been calculated we can compute $G_i = F_i(t) - A_i W(t)$. We thus obtain at this stage

$$ \langle f_i\rangle = A_i \frac{\mu_w}{\sigma_w} + \langle f_i^{\mathrm{int}}\rangle \qquad (20) $$

This is a set of N equations for N + 1 unknowns ($\mu_w/\sigma_w$ and the $\langle f_i^{\mathrm{int}}\rangle$'s), and we are thus left with one free parameter, the ratio $\mu_w/\sigma_w$. Knowing its value would give the N local averages, the $\langle f_i^{\mathrm{int}}\rangle$'s. Less importantly, one may also want to fix the average $\mu_w$ (hence both $\mu_w$ and $\sigma_w$) in order to fully determine the pattern $w(t)$. This is of interest only for a direct comparison between this pattern and the national average (2). Equation (20) suggests a statistical linear correlation between $\langle f_i\rangle$ and $A_i$, with a slope given by $\mu_w/\sigma_w$. We will indeed observe a linear correlation in the data sets (next section, Figure 2). However, the $\langle f_i^{\mathrm{int}}\rangle$'s may themselves be correlated with the $A_i$'s. Hence, unfortunately, a linear regression cannot be used to get an unbiased estimate of the parameter $\mu_w/\sigma_w$. In the absence of additional information or hypotheses, this parameter remains arbitrary. However, one may compare the qualitative results obtained for different choices of $\mu_w/\sigma_w$, to see which properties are robust and which ones are fragile. In particular, one would like to be able to assess how a given region is behaving compared to another given region, and/or to the global trend. To do so, in the applications below we will in particular analyze: (i) the correlations between the two local terms, $A_i$ and $\langle f_i^{\mathrm{int}}\rangle$; (ii) the robustness of the ranking given by the $\langle f_i^{\mathrm{int}}\rangle$'s; (iii) the sign of $\langle f_i^{\mathrm{int}}\rangle$; (iv) the quantitative and qualitative similarities between $f_i^{\mathrm{int}}(t)$ and the naive estimate $f_i^{\mathrm{int},n}(t)$.
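The following sketch illustrates, with hypothetical values, why the linear correlation suggested by Eq. (20) cannot by itself fix $\mu_w/\sigma_w$. When the local averages $\langle f_i^{\mathrm{int}}\rangle$ are uncorrelated with the $A_i$'s, the least-squares slope of $\langle f_i\rangle$ versus $A_i$ is close to $\mu_w/\sigma_w$. When they are correlated, the slope is shifted by the covariance term. The ratio 4 and the dependence $1 - 0.8A_i$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500
ratio_true = 4.0                               # hypothetical mu_w / sigma_w
A = rng.uniform(0.5, 1.5, size=N)              # hypothetical prefactors A_i

# Case 1: local averages independent of A_i
int_uncorr = 1.0 + rng.normal(0.0, 0.3, size=N)
# Case 2: local averages that decrease with A_i
int_corr = 1.0 - 0.8 * A + rng.normal(0.0, 0.1, size=N)

slopes = {}
for label, f_int_mean in [("uncorrelated", int_uncorr), ("correlated", int_corr)]:
    f_mean = ratio_true * A + f_int_mean       # Eq. (20): <f_i> = A_i mu_w/sigma_w + <f_i^int>
    slopes[label] = np.polyfit(A, f_mean, 1)[0]
    print(f"{label}: fitted slope = {slopes[label]:.3f} (true ratio = {ratio_true})")
```

From the pairs $(A_i, \langle f_i\rangle)$ alone, the two cases look like two clean linear relations with different slopes. The choice of $\mu_w/\sigma_w$ therefore remains a modeling assumption.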

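We will focus on two particular scenarios. First, one may require the global trend to fall 'right in the middle' of the N series. There are different ways to quantify this. One way is to note that, in the absence of an internal contribution, $f_i/a_i$ would be equal to w, hence $\langle f_i\rangle/A_i$ would be equal to $\mu_w/\sigma_w$. Therefore we may compute $\mu_w/\sigma_w$ by imposing

$$ \frac{\mu_w}{\sigma_w} = \frac{1}{N}\sum_i \frac{\langle f_i\rangle}{A_i} \qquad (21) $$

which is equivalent to imposing $\sum_i \langle f_i^{\mathrm{int}}\rangle/A_i = 0$. An alternative is to require the resulting $f_i^{\mathrm{int}}$ to be as close as possible to the naive ones (Eq. (6)), by minimizing $\sum_i \langle (f_i^{\mathrm{int}} - f_i^{\mathrm{int},n})^2\rangle$, which gives

$$ \frac{\mu_w}{\sigma_w} = \frac{\bar A\, \langle f^{\mathrm{av}}\rangle}{\overline{A^2}} \qquad (22) $$

In both cases one may then fix $\mu_w$ from $\mu_w = \langle f^{\mathrm{av}}\rangle$, or by imposing $w(t_0) = f^{\mathrm{av}}(t_0)$ for some arbitrarily chosen $t_0$. Finally, one may instead seek a conservative comparison with the naive approach by minimizing the difference between w and $f^{\mathrm{av}}$. This can be done either by writing $\mu_w = \langle f^{\mathrm{av}}\rangle$ (or $w(t_0) = f^{\mathrm{av}}(t_0)$) and $\sigma_w^2 = \langle (f^{\mathrm{av}})^2\rangle_c$, or by minimizing $\langle (w - f^{\mathrm{av}})^2\rangle$, which gives

$$ \mu_w = \langle f^{\mathrm{av}}\rangle, \qquad \sigma_w = \langle W f^{\mathrm{av}}\rangle \qquad (23) $$

For large N, one can check that the results depend only weakly on which of these reasonable choices is made.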
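The three prescriptions above can be written as one-line functions of the data array and of the estimated $A_i$ and $W$. The sketch below uses hypothetical synthetic data in which the generating ratio is 3 and every local average equals 2. The test checks that Eqs. (21) and (22) satisfy their defining conditions, rather than that they recover the generating ratio. Because the local averages are nonzero here, neither prescription returns 3, which is exactly the arbitrariness discussed above.

```python
import numpy as np

def ratio_eq21(f, A):
    """mu_w/sigma_w from Eq. (21): mean over units of <f_i>/A_i."""
    return np.mean(f.mean(axis=1) / A)

def ratio_eq22(f, A):
    """mu_w/sigma_w from Eq. (22): Abar <f_av> / mean(A^2)."""
    return A.mean() * f.mean(axis=0).mean() / np.mean(A ** 2)

def scale_eq23(f, W):
    """mu_w and sigma_w from Eq. (23), minimizing <(w - f_av)^2> with w = mu_w + sigma_w W."""
    f_av = f.mean(axis=0)
    return f_av.mean(), np.mean(W * f_av)

rng = np.random.default_rng(6)
N, T = 100, 200
W = np.sin(2 * np.pi * np.arange(T) / 40)
W = (W - W.mean()) / W.std()
A = rng.uniform(0.5, 1.5, size=N)              # hypothetical A_i
# hypothetical data: mu_w/sigma_w = 3 and local averages equal to 2
f = 2.0 + A[:, None] * (3.0 + W[None, :]) + rng.normal(0.0, 0.5, size=(N, T))

r21, r22 = ratio_eq21(f, A), ratio_eq22(f, A)
mu_w, sigma_w = scale_eq23(f, W)
print(f"Eq. (21): {r21:.3f}  Eq. (22): {r22:.3f}  Eq. (23): mu_w = {mu_w:.3f}, sigma_w = {sigma_w:.3f}")
```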

The second scenario considers the correlations between the $\langle f_i^{\mathrm{int}}\rangle$'s and the $A_i$'s. As we will see, the first hypothesis leads to a strictly negative correlation. An alternative is thus to explore the consequences of assuming no correlation, hence requiring

$$ \overline{A\, \langle f^{\mathrm{int}}\rangle} - \bar A\; \overline{\langle f^{\mathrm{int}}\rangle} = 0 \qquad (24) $$

which implies that the slope of the observed linear correlation of $\langle f_i\rangle$ with $A_i$ gives the value of $\mu_w/\sigma_w$. As explained above, for each application below we will discuss the robustness of the results with respect to these choices of the parameter $\mu_w/\sigma_w$.

We can now summarize our method. It consists in (i) estimating the $A_i$'s using Eq. (16) (or using the eigenvector corresponding to the largest eigenvalue of the correlation matrix, section SI1), (ii) computing W using Eq. (17), and finally (iii) comparing the results for different hypotheses on $\mu_w/\sigma_w$, as discussed above. We propose to call this method External Trend and Internal Component Analysis (ETICA). We note that if the hypotheses Eqs. (4), (8) and (9) are correct, the method gives estimates of W and of the $A_i$'s (hence of $f_i^{\mathrm{int}} - \langle f_i^{\mathrm{int}}\rangle$) which become exact in the limit of large T and N. It also gives a good estimate of the full trend w (hence of the $\langle f_i^{\mathrm{int}}\rangle$'s) whenever this trend, qualitatively, does fall 'in the middle' of the time series.
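The three steps can be gathered into a single function, sketched below under stated assumptions. The name `etica` and its signature are hypothetical. When no ratio is supplied, step (iii) uses the zero-covariance condition, Eq. (24), computed as the least-squares slope of $\langle f_i\rangle$ on the estimated $A_i$. Any of the other prescriptions could be substituted. The synthetic data are illustrative, with local means drawn independently of the $A_i$'s so that Eq. (24) holds on average.

```python
import numpy as np

def etica(f, ratio=None):
    """Sketch of steps (i)-(iii) for an (N, T) array f.

    ratio is the free parameter mu_w/sigma_w. If None, it is set by the
    zero-covariance condition, Eq. (24), i.e. the least-squares slope of <f_i> on A_i.
    Returns A_i, W(t), the ratio used and the internal parts f_i^int(t).
    """
    f_mean = f.mean(axis=1)
    F = f - f_mean[:, None]                        # F_i(t)
    C = F @ F.T / f.shape[1]                       # C_ij = <F_i F_j>
    A = C.sum(axis=1) / np.sqrt(C.sum())           # (i) Eq. (16)
    W = F.mean(axis=0) / A.mean()                  # (ii) Eq. (17)
    if ratio is None:                              # (iii) one of the choices discussed above
        ratio = np.cov(A, f_mean, bias=True)[0, 1] / np.var(A)
    f_ext = A[:, None] * (ratio + W[None, :])      # a_i w(t) = A_i (mu_w/sigma_w + W(t))
    return A, W, ratio, f - f_ext

rng = np.random.default_rng(7)
N, T = 200, 500
W_true = rng.normal(size=T)
W_true = (W_true - W_true.mean()) / W_true.std()
A_true = rng.uniform(0.5, 1.5, size=N)
# hypothetical internal terms: local means independent of A_i, plus independent noise
f_int = rng.normal(1.0, 0.5, size=(N, 1)) + rng.normal(0.0, 1.0, size=(N, T))
f = A_true[:, None] * (3.0 + W_true[None, :]) + f_int   # mu_w/sigma_w = 3

A_hat, W_hat, ratio, f_int_hat = etica(f)
print(f"estimated mu_w/sigma_w = {ratio:.2f}")
print("corr(f_int_hat, f_int) =", np.corrcoef(f_int_hat.ravel(), f_int.ravel())[0, 1])
```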

Once we have extracted with this method the local contribution $f_i^{\mathrm{int}}$ and the collective pattern $w(t)$, together with its redistribution factor for each local series, we can study different quantities, as illustrated below on different applications of the method. In general, although this method gives a pattern $w(t)$ very similar to the sample average $\bar f(t)$, we will see that there is nontrivial structure in the prefactors $a_i$, leading to nontrivial local contributions $f_i^{\mathrm{int}}(t)$.

In some cases one may expect to have, in addition to the local contribution, a linear combination of several global trends (a small number of 'sources'): we leave for future work the extension of our method to several external trends.



APPLICATIONS: CORRELATED RANDOM WALKERS, CRIME RATES IN THE US AND FRANCE, OBESITY IN THE US.

We first test our method on synthetic series, and we then illustrate it on crime rate series (in the United States and in France) and on US obesity rate series. For the crime rates, a plot of the time series shows that a common trend obviously exists (Fig. 1). After computing the internal and external terms, we perform different tests in order to assess the validity of the approach. In particular, Figure 2 shows a plot of the local factors $A_i$ versus the data time averages, the $\langle f_i\rangle$'s. One observes a statistical linear correlation in each of the data sets. We stress that the $A_i$'s are computed from the covariance matrix of the data, hence after removing the means from the time series. The fact that we do observe a linear correlation is thus a hint that our hypothesis on the data structure is reasonable. In contrast, the very good linear correlation observed in [...] can be shown to be an artefact of the method used in these works, which leads to an exact proportionality independently of the data structure (see section SI2). We now discuss in more detail the synthetic series, each of the crime rate data sets, and the obesity rate.

FIG. 1: Collective pattern. Crime rates for the US (upper panel) and France (lower panel) normalized by their time average. The thick black line represents the collective pattern $w(t)$ computed with our method.

FIG. 2: Existence of a linear correlation. We plot the prefactors $A_i$ versus the time averages $\langle f_i\rangle$ for the three different datasets.

FIG. 3: (A) Original signal composed of the superposition of a sinusoidal trend and Gaussian noises (for N = 100 walkers). (B) Exact local contribution. (C) Local contribution extracted with our method.



Synthetic series: correlated random walkers. 

We can illustrate our method on the case of correlated random walkers described by the equation

(25)

where $F(t)$ is the global trend imposed on all walkers and the $\xi_i(t)$ are Gaussian noises, with possible correlations between different walkers, $\langle \xi_i(t)\xi_j(t)\rangle = [(N - M)\delta_{ij} + a^2 M]/12$, where a and M are tunable parameters (see supporting section SIS). For M = 0, the random noises $\xi_i(t)$ are independent and our method is very accurate. As an example, we choose a sinusoidal trend $F(t) = \sin(\omega t)$ and plot in Figure 3 the original signal, the exact local contribution and the local contribution computed with our method. As the correlation between walkers increases, we study the Pearson correlation coefficient between the original local contribution and the estimate provided by our method. We observe that our method is indeed accurate as long as the correlations between the $G_i$'s are not too large, which corresponds here to the condition $a^2 M \ll 1$.
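Since Eq. (25) is not reproduced here, the following sketch uses a hypothetical stand-in model rather than the paper's walkers. Each unit responds to the trend $F(t) = \sin(\omega t)$ with its own factor and receives Gaussian noise, a fraction c of whose variance is shared by all units. As in the text, it measures the Pearson correlation between the true and the estimated (centered) local contributions. This correlation should decrease as the correlations between the $G_i$'s grow, because the shared noise is partly absorbed into the estimated trend. The parameter c is an assumption and does not correspond to the paper's a and M.

```python
import numpy as np

def local_part(f):
    """Centered internal part G_i(t) = F_i(t) - A_i W(t), with A_i from Eq. (16) and W from Eq. (17)."""
    F = f - f.mean(axis=1, keepdims=True)
    C = F @ F.T / f.shape[1]
    A = C.sum(axis=1) / np.sqrt(C.sum())
    W = F.mean(axis=0) / A.mean()
    return F - A[:, None] * W[None, :]

rng = np.random.default_rng(8)
N, T = 100, 400
trend = np.sin(2 * np.pi * np.arange(T) / 100)     # F(t) = sin(omega t), as for Fig. 3
a = rng.uniform(0.5, 1.5, size=N)                  # hypothetical response of each unit

pearson = []
for c in (0.0, 0.05, 0.2, 0.5):                    # c: hypothetical shared-noise fraction
    common = rng.normal(size=T)                    # noise component shared by all units
    xi = np.sqrt(1 - c) * rng.normal(size=(N, T)) + np.sqrt(c) * common[None, :]
    f = a[:, None] * trend[None, :] + 0.5 * xi     # stand-in model, not Eq. (25)
    G_true = 0.5 * (xi - xi.mean(axis=1, keepdims=True))
    rho = np.corrcoef(G_true.ravel(), local_part(f).ravel())[0, 1]
    pearson.append(rho)
    print(f"c = {c:.2f}: Pearson r = {rho:.3f}")
```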



Crime rates in the US and France. 

In criminology an essential question concerns the impact of local policies, a subject of much debate [..., 14]. In order to assess these local effects (at the level of a state or a region), most authors consider the difference between a state's evolution and the national average. As noted above, this may lead to incorrect conclusions. In this second part of the applications, we thus illustrate our method on the series of crime rates in the 50 US states [...] for the period 1965-2005, and in about 100 French départements [16] for the period 1974-2007. In Fig. 1 we represent these time series normalized by their time average. The observed data collapse confirms the existence of a collective pattern; the plot also shows the collective pattern $w(t)$ obtained with our method. For the French case, we have removed outliers which do not satisfy our initial assumptions: the series of these départements are uncorrelated with the rest of the crime rates and cannot be incorporated in the calculation of the collective pattern. We apply our method to these data and extract $w(t)$, the $A_i$'s and $f_i^{\mathrm{int}}(t)$. As already mentioned, we plot in Fig. 2 the $A_i$'s vs. the averages $\langle f_i\rangle$, which exhibit a statistical linear correlation. We can check a posteriori that all conditions assumed in our calculation are fulfilled: zero $\langle w f_i^{\mathrm{int}}\rangle_c$ and small $\langle G_i G_j\rangle$ (see SI1). We also checked that the coefficients $a_i$ do not vary too much with the period considered, which is an important condition for our method (see the discussion of different datasets in SI4).

In order to assess quantitatively the importance of local versus external fluctuations, we study in particular the ratio of dispersions defined by

$$ \eta_i = \frac{\sigma_i^{\mathrm{ext}}}{\sigma_i^{\mathrm{int}}} \qquad (26) $$

where the external contribution is the standard deviation of $f_i^{\mathrm{ext}}(t) = a_i w(t)$, that is $\sigma_i^{\mathrm{ext}} = A_i$, and the internal one is given by $(\sigma_i^{\mathrm{int}})^2 = \langle (f_i^{\mathrm{int}})^2\rangle_c = \langle G_i^2\rangle$. Note that the quantities $\eta_i$, being based on fluctuations, do not depend on $\mu_w/\sigma_w$. For both France and the US, this quantity is found to be larger than one. This indicates that external factors always dominate over local fluctuations, while local policies seem to play a minor role. In the case of crime, these external effects might be socio-economic factors such as unemployment, density, etc.
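The ratio of Eq. (26) only needs the centered quantities, so it can be sketched without choosing $\mu_w/\sigma_w$. The constant offset added to the data below has no effect on the result. The synthetic external and internal dispersions are hypothetical, and $|A_i|$ is used so that the sketch does not break if an estimated prefactor happens to be negative.

```python
import numpy as np

def dispersion_ratio(f):
    """eta_i = sigma_i^ext / sigma_i^int (Eq. 26), with sigma_i^ext = |A_i|, sigma_i^int = <G_i^2>^(1/2)."""
    F = f - f.mean(axis=1, keepdims=True)
    C = F @ F.T / f.shape[1]
    A = C.sum(axis=1) / np.sqrt(C.sum())       # Eq. (16)
    W = F.mean(axis=0) / A.mean()              # Eq. (17)
    G = F - A[:, None] * W[None, :]            # centered internal part
    return np.abs(A) / np.sqrt(np.mean(G ** 2, axis=1))

rng = np.random.default_rng(9)
N, T = 150, 1000
W = rng.normal(size=T)
W = (W - W.mean()) / W.std()
A = rng.uniform(0.5, 1.5, size=N)              # hypothetical external dispersions
s = rng.uniform(0.3, 1.0, size=N)              # hypothetical internal dispersions
f = 10.0 + A[:, None] * W[None, :] + s[:, None] * rng.normal(size=(N, T))

eta_hat, eta_true = dispersion_ratio(f), A / s
print("fraction of units with eta_i > 1:", np.mean(eta_hat > 1))
print("max relative error:", np.max(np.abs(eta_hat / eta_true - 1)))
```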

In addition to computing the average of the $\eta_i$'s, we can also observe the time evolution of the heterogeneity, defined by the sample variances of the different components. We first observe in Fig. 4 that large fluctuations occur in the transition period of the mid-70's, during which crime rates increased significantly. We also observe for the USA that until 1980, fluctuations were essentially governed by local effects. This situation is inverted after 1980, with the external contribution increasing in the post-80's period. In particular, during the period 1980-2000, in which one observes a decline of crime rates [13], it is the collective trend which determines the fluctuations.

FIG. 4: Comparison of internal and external fluctuations. In the left (right) column we present the results for the US (France). In the upper panels, we represent the total variance of the signal, and the external and internal contributions. In the lower panels, we represent the external and internal contributions and the covariances, normalized by the variance of the signal. We can observe that for the US, the external contribution has been dominating since the 80's.

FIG. 5: Determination of $\sigma_w$ in the US crime rate case. We can use various conditions in order to determine $\sigma_w$: $\sum_i \langle f_i^{\mathrm{int}}\rangle/A_i = 0$, $\sum_i \langle f_i^{\mathrm{int}}\rangle/\sum_i A_i = 0$, or $r = 0$ (r is defined in the text). We see in this plot that they all give very similar values. Lower panels: average fraction of time for which $f_i^{\mathrm{int}}$ has the same sign as the naive calculation.

Even though we have presented results for reasonable choices of the parameter $\sigma_w$ (in the following we make the harmless choice $\mu_w = 1$), one can ask how robust the different observed properties are. First, we can compare the predictions for $\sigma_w$ obtained with the different assumptions used in this paper. In the upper panels of Figs. 5 and 6 we show, for the US (France), three quantities: $\overline{\langle f^{\mathrm{int}}\rangle/A}$, $\overline{\langle f^{\mathrm{int}}\rangle}/\bar A$, and $r = (\overline{A\langle f^{\mathrm{int}}\rangle} - \bar A\, \overline{\langle f^{\mathrm{int}}\rangle})/\sigma_A^2$, where $\sigma_A^2$ is the sample variance of the $A_i$'s. We see in these figures that these quantities vanish for values of $\sigma_w$ which are very close to each other. We also compute the fraction of time $p_i$ for which $f_i^{\mathrm{int}}(t)$ and the naive estimate $f_i(t) - f^{\mathrm{av}}(t)$ have the same sign. We plot in the lower panels of Figs. 5 and 6 the quantity $p = \frac{1}{N}\sum_i p_i$. It shows that, over this range of $\sigma_w$, the signs of $f_i^{\mathrm{int}}$ and $f_i - f^{\mathrm{av}}$ are the same for about 60% of the time period. We can also study the sign of $\langle f_i^{\mathrm{int}}\rangle$ versus $\sigma_w$, and we observe some robustness. In particular, in the US case, approximately 6 states (CA, NV, MO, MI, NY, AZ) have a positive local contribution (in the range $\sigma_w \in [0.24, 0.32]$), while 6 states always have a negative local contribution (VT, GA, LA, NH, CT, MS). In these cases we can reasonably imagine that local policies have a noticeable effect.

FIG. 6: As in Figure 5, we can determine $\sigma_w$ in the case of the crime rate in France by using different conditions: $\sum_i \langle f_i^{\mathrm{int}}\rangle/A_i = 0$, $\sum_i \langle f_i^{\mathrm{int}}\rangle/\sum_i A_i = 0$, or $r = 0$. Here also, these conditions give very similar values of $\sigma_w$. Lower panels: average fraction of time for which $f_i^{\mathrm{int}}$ has the same sign as the naive calculation $f_i - f^{\mathrm{av}}$.

Finally, we can also analyze the ranking of the local contributions $\langle f_i^{\mathrm{int}}\rangle$ versus $\sigma_w$ by computing Kendall's $\tau$ for the two consecutive series $\{\langle f_i^{\mathrm{int}}\rangle\}(\sigma_w)$ and $\{\langle f_i^{\mathrm{int}}\rangle\}(\sigma_w + \delta\sigma_w)$. In both cases (France and the US) we observe a $\tau$ larger than 0.9 over the chosen range $\sigma_w \in [0, 0.5]$, whereas the control case of a random permutation gives less than 0.1. This indicates a strong robustness of the ranking. It means that, independently of the assumption used to compute $\sigma_w$, we can rank the different regions according to the importance of their local contribution.
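The ranking test can be sketched with a small Kendall's $\tau$ function (tau-a, no tie correction) and Eq. (20) with $\mu_w = 1$, which gives $\langle f_i^{\mathrm{int}}\rangle(\sigma_w) = \langle f_i\rangle - A_i/\sigma_w$. The values of $A_i$ and $\langle f_i\rangle$ are hypothetical. The grid starts at $\sigma_w = 0.1$ to avoid the divergence of $1/\sigma_w$ at 0, and its 0.01 step is an arbitrary choice. In practice `scipy.stats.kendalltau` could replace the hand-written function.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a (no correction for ties) between two 1-D arrays."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))     # (concordant - discordant) / (n(n-1)/2)

rng = np.random.default_rng(10)
N = 50
A = rng.uniform(0.5, 1.5, size=N)                  # hypothetical A_i
f_mean = 3.0 * A + rng.normal(0.0, 1.0, size=N)    # hypothetical <f_i>

mu_w = 1.0                                         # the choice mu_w = 1 made in the text
sigmas = np.arange(0.10, 0.50 + 1e-9, 0.01)
# From Eq. (20): <f_i^int>(sigma_w) = <f_i> - A_i mu_w / sigma_w
local_means = [f_mean - A * mu_w / s for s in sigmas]
taus = [kendall_tau(u, v) for u, v in zip(local_means, local_means[1:])]
print(f"min consecutive tau = {min(taus):.3f}")
print(f"tau against a random permutation = {kendall_tau(local_means[0], rng.permutation(local_means[0])):.3f}")
```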
Obesity in the US.

Obesity is defined as a body mass index (BMI), the ratio of the body mass to the square of the height, larger than 30 kg/m^2. Its prevalence is rapidly increasing in the world [...]; it has reached epidemic proportions in the US and is now a major public health concern [..., 10]. Disparities by sex and between ethnic groups have been observed in the prevalence of obesity [11], but few studies focus on the effect of local factors and policies on the obesity rate. We thus apply our method to data from the CDC [12], which give the percentage of the population that is obese for each US state over the period 1995-2008. As in the crime rate case, we can compare the variances of the internal and external contributions (see SIS). We observe that the external contribution has been dominating since the year 2000. This result means that the global trend is the major cause of the evolution of obesity in the different states. We can get more detailed information about the specific behavior of the states by studying two quantities: the ratio $\eta_i$ defined in Eq. (26), and the ratio of the time-averaged local contribution to the total signal, $y_i = \langle f_i^{\mathrm{int}}\rangle/\langle f_i\rangle$. We represent these two quantities in a plane (see Figure 7). We first note that $\eta_i > 1$ for all states, which means that fluctuations are mainly governed by the global trend. We can also divide the states into two groups (with $y_i > 0$ and $y_i < 0$). For large and positive $y_i$, the states have a small $a_i$, which means that these states are the least susceptible to the global trend, while in the opposite case the states are governed by the global trend. Within each group we can then distinguish the states according to their level of fluctuations ($\eta_i$ close to or much larger than one). The states Arizona, Georgia and Oklahoma, for example, have very little local contribution and their variations are dominated by the global trend. In this respect, states such as DC and Indiana are very different from the first group. More generally, we can see in this figure that states with large prevalence display very different values of $(y_i, \eta_i)$. This result indicates that describing states by their prevalence only can be very misleading and can hide important dynamical behaviors.

FIG. 7: Fluctuations versus importance of the local contribution. We plot the quantity $\eta_i$ versus $y_i = \langle f_i^{\mathrm{int}}\rangle/\langle f_i\rangle$ for the different US states. We divide the states into three groups (circles: share less than 22%; squares: share in the interval [22%, 26%]; diamonds: share larger than 26%). Low-prevalence states seem to concentrate in the same region $y_i \approx 0$, while medium- and high-prevalence states display very different values of $\eta_i$ and $y_i$.

FIG. 8: Difference with the naive fluctuations and local contribution. We represent for the different states the difference vectors $((\langle f_i^{\mathrm{int}}\rangle - \langle f_i^{\mathrm{int},n}\rangle)/\langle f_i\rangle,\ \eta_i - \eta_i^{n})$. For the sake of clarity, we indicate the name of the corresponding state for most vectors, except for those close to (0,0). For half of the states, the difference between the naive calculation and our method is not negligible.

Finally, we also computed the quantities $y_i$ and $\eta_i$ using the 'naive' local contribution based on the national average $f^{\mathrm{av}}(t)$ defined in Eq. (2), that is $f_i^{\mathrm{int},n}(t) = f_i(t) - f^{\mathrm{av}}(t)$. We represent in Figure 8 the difference as vectors with components $((\langle f_i^{\mathrm{int}}\rangle - \langle f_i^{\mathrm{int},n}\rangle)/\langle f_i\rangle,\ \eta_i - \eta_i^{n})$. This figure shows that for roughly half of the states the naive calculation of the local contribution can be very misleading.
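The comparison summarized in Fig. 8 can be sketched as a function returning, for each unit, the difference between the ETICA and naive values of $y_i$ and $\eta_i$. The function name and signature are hypothetical. The fitted $A_i$, $W$ and $\mu_w/\sigma_w$ are passed in as inputs; in practice they would come from the estimation steps described in the method section. The synthetic data below use the known generating values. In a purely additive model, with all $a_i$ equal and internal terms averaging to zero over units, both components vanish. Heterogeneous $a_i$'s make them nonzero.

```python
import numpy as np

def naive_vs_etica(f, A, W, ratio):
    """Components ((<f_i^int> - <f_i^int,n>)/<f_i>, eta_i - eta_i^n) of the Fig. 8 vectors."""
    f_av = f.mean(axis=0)                          # national average, Eq. (2), unweighted
    f_int = f - A[:, None] * (ratio + W[None, :])  # ETICA internal part
    f_int_n = f - f_av[None, :]                    # naive internal part, Eq. (6)
    eta = np.abs(A) / f_int.std(axis=1)            # Eq. (26)
    eta_n = f_av.std() / f_int_n.std(axis=1)       # same ratio with f_av as the external part
    dy = (f_int.mean(axis=1) - f_int_n.mean(axis=1)) / f.mean(axis=1)
    return dy, eta - eta_n

rng = np.random.default_rng(11)
N, T = 40, 300
W = np.sin(2 * np.pi * np.arange(T) / 60)
W = (W - W.mean()) / W.std()
X = rng.normal(0.0, 0.3, size=(N, T))
f_int_true = X - X.mean(axis=0)                    # internal terms with zero mean over units

# Purely additive case: a_i = 1, sigma_w = 0.5, mu_w/sigma_w = 4, so the naive split is exact
A_eq = np.full(N, 0.5)
f_add = A_eq[:, None] * (4.0 + W[None, :]) + f_int_true
dy_add, deta_add = naive_vs_etica(f_add, A_eq, W, 4.0)

# Heterogeneous a_i: the naive split absorbs (a_i - abar) w(t) into the local term
a = rng.uniform(0.5, 1.5, size=N)
A_het = 0.5 * a
f_het = A_het[:, None] * (4.0 + W[None, :]) + f_int_true
dy_het, deta_het = naive_vs_etica(f_het, A_het, W, 4.0)
print("additive case, max |dy|:", np.abs(dy_add).max())
print("heterogeneous case, max |dy|:", np.abs(dy_het).max())
```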



DISCUSSION. 

In this article we addressed the crucial problem of extracting the local components of a system governed by a global trend. In this case, comparing the local signal to the average can be very misleading and can lead to wrong conclusions. We applied our method to the example of crime rate series in the US and France, and our analysis revealed surprising facts. The most important result concerns the fluctuations, which after the 80's in the US are governed by external factors. This result suggests that understanding the evolution of crime rates relies mostly on identifying global socio-economic behavior rather than local effects such as state policies. In particular, this result could also help in understanding the decreasing trend observed in the US, which so far remains a puzzle [..., 17]. In the case of obesity, we show that external factors have dominated since the year 2000. Perhaps more importantly, states with the same level of prevalence have very different dynamical behaviors, which calls for a detailed study state by state.

However, one may expect an even better signal analysis by assuming that there are several independent external trends. It will be interesting to see whether our approach, combined with more standard ICA techniques, can be generalized to the case of several global trends (a small number of 'sources'). The recent availability of large amounts of data on social systems calls for tools able to analyze them and to extract meaningful information. We hope that our present contribution will help in the understanding of these systems, where the local dynamics is superimposed on collective trends.
SUPPORTING INFORMATION 1. DETERMINING THE $A_i$'S BY USING EIGENVECTORS OF THE CORRELATION MATRIX

The data correlation matrix $C_{ij}$ is known to provide useful information, in particular for the analysis of financial time series [9, 10], and in other fields, e.g. protein structure analysis [11]. The first, largest, eigenvalue is related to a global trend, and usually one is interested in the small number of intermediate eigenvalues. The associated eigenvectors give the relevant correlations in the data; for example, they allow one to extract the sectors in financial time series. Here, making explicit use of our hypotheses, we extract the $A_i$ factors from the first eigenvector of the correlation matrix. These factors describe how the global trend is amplified or reduced at the local level.

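We have

$$C_{ij} = A_i A_j + D_{ij} \tag{27}$$

where $D_{ij} = \langle G_i G_j \rangle$. If $\psi$ is a normalized eigenvector ($\psi \cdot \psi = 1$) of $C$ with eigenvalue $\lambda$, i.e. $C \cdot \psi = \lambda \psi$, we have

$$C \cdot \psi = (A \cdot \psi)\, A + D \cdot \psi \tag{28}$$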

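The case $A \cdot \psi = 0$ would imply that $\psi$ is also an eigenvector of $D$, which in general is unlikely (there is no reason for the eigenvectors of $D$ to be orthogonal to $A$). If $A \cdot \psi \neq 0$, taking the scalar product of Eq. (28) with $A$ and using $C \cdot \psi = \lambda \psi$, we obtain

$$\lambda = A \cdot A + \frac{A \cdot D \cdot \psi}{A \cdot \psi} \tag{29}$$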
and, dividing Eq. (28) by $\lambda$,

$$\psi = \frac{A \cdot \psi}{\lambda}\, A + \frac{D \cdot \psi}{\lambda} \tag{30}$$



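For the largest eigenvalue, we neglect at first order the second term on the right-hand side of this last equation, which leads to $\psi \propto A$. Since $\psi$ is normalized, $\psi \simeq A / \sqrt{A \cdot A}$, and Eq. (29) gives $\lambda \simeq A \cdot A$; we thus obtain

$$A_i \simeq \sqrt{\lambda}\, \psi_i \tag{31}$$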

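This approximation is justified if the second term of Eq. (29), $A \cdot D \cdot \psi / (A \cdot \psi)$, is small compared to $A \cdot A$; with $\psi \simeq A / \sqrt{A \cdot A}$ this condition reads

$$\frac{A \cdot D \cdot A}{(A \cdot A)^2} \ll 1 \tag{32}$$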



Since $A \cdot A = O(N)$, this approximation is justified if $A \cdot D \cdot A$ is of order $N$ and not of order $N^2$. This is the case if $D$ is diagonal (which means that the internal components are not correlated, $\langle G_i G_j \rangle \propto \delta_{ij}$), but also if the number of non-zero terms in each row of $D$ remains finite as $N$ grows, in other words if $D$ is a sparse matrix.
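The condition of Eq. (32) can be illustrated numerically. The sketch below is a minimal check on synthetic matrices, not on the data analysed here: the amplitudes `A_true`, the diagonal entries of $D$ and the uniform off-diagonal level `c = 0.3` are hypothetical choices. It builds $C$ from Eq. (27), estimates $A$ through Eq. (31) with NumPy's symmetric eigensolver, and contrasts a diagonal $D$ with a dense one, for which $A \cdot D \cdot A$ grows like $N^2$.

```python
import numpy as np

def amplitudes_from_leading_eigenvector(C):
    """Estimate A_i ~ sqrt(lambda) * psi_i from the largest eigenpair of a symmetric C (Eq. 31)."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    lam, psi = eigvals[-1], eigvecs[:, -1]
    if psi.sum() < 0:                      # eigenvectors are defined up to a sign
        psi = -psi
    return np.sqrt(lam) * psi, lam

rng = np.random.default_rng(0)
N = 200
A_true = rng.uniform(0.5, 1.5, size=N)           # hypothetical amplitudes
D_diag = np.diag(rng.uniform(0.5, 1.5, size=N))  # uncorrelated internal parts

def check(D):
    """Return (max estimation error, left-hand side of Eq. 32, largest eigenvalue)."""
    C = np.outer(A_true, A_true) + D             # Eq. (27)
    A_est, lam = amplitudes_from_leading_eigenvector(C)
    ratio = A_true @ D @ A_true / (A_true @ A_true) ** 2
    return np.abs(A_est - A_true).max(), ratio, lam

c = 0.3                                          # hypothetical uniform off-diagonal correlation
D_dense = D_diag + c * (np.ones((N, N)) - np.eye(N))

err_diag, ratio_diag, lam_diag = check(D_diag)
err_dense, ratio_dense, _ = check(D_dense)
print(f"diagonal D: ratio = {ratio_diag:.4f}, max error = {err_diag:.4f}")
print(f"dense D:    ratio = {ratio_dense:.4f}, max error = {err_dense:.4f}")
```

With a diagonal $D$ the ratio of Eq. (32) is of order $1/N$ and the eigenvector estimate stays close to `A_true`; with the dense $D$ the ratio is no longer small and the estimate is visibly biased.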

We compared the values of $A_i$ computed with the method described in the main text and with the eigenvector method. The results are reported in figures 9, 10 and 11.




FIG. 9: Comparison of the $A_i$ computed with the expressions of the main text (Eq. 16) and with the components of the eigenvector corresponding to the largest eigenvalue of $C_{ij}$, in the case of crime rates in the US.

FIG. 10: Comparison of the $A_i$ computed with the expressions of the main text (Eq. 16) and with the components of the eigenvector corresponding to the largest eigenvalue of $C_{ij}$, in the case of crime rates in France.

We see that for the crime rates in the US and in France, $D_{ij}$ is indeed negligible, which shows that the correlations between the internal contributions of different local units (e.g. the states in the US) are negligible. This is not the case for the stocks in the S&P500, where we observe (small) discrepancies between the two methods, a result which supports the idea of sectors in the S&P500.



FIG. 11: Comparison of the $A_i$ computed with the expressions of the main text (Eq. 16) and with the components of the eigenvector corresponding to the largest eigenvalue of $C_{ij}$, in the case of the S&P500.

SUPPORTING INFORMATION 2. SCALING

We show that the scaling $\sigma_i^{\rm ext} \propto \langle f_i \rangle$ observed by de Menezes and Barabási in [6, 7] is actually built into the method proposed by these authors: it is a direct consequence of their definitions of the internal and external parts, and it does not depend on the data structure.

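Indeed, let $f_i(t)$, $t = 1, \dots, T$, $i = 1, \dots, N$, be an arbitrary data set such that $\langle f \rangle \neq 0$. For $i = 1, \dots, N$, following [6], define $A_i^{\rm MB}$ by

$$A_i^{\rm MB} = \frac{\langle f_i \rangle}{\langle f \rangle} \tag{33}$$

and $f_i^{\rm MB,ext}(t)$ by

$$f_i^{\rm MB,ext}(t) = A_i^{\rm MB} f(t) \tag{34}$$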



Then, from these definitions and without any hypothesis or constraint on the data other than $\langle f \rangle \neq 0$, one has

$$\langle f_i^{\rm MB,ext} \rangle = A_i^{\rm MB} \langle f \rangle = \langle f_i \rangle \tag{35}$$
and

$$\langle (f_i^{\rm MB,ext})^2 \rangle = (A_i^{\rm MB})^2 \langle f^2 \rangle \tag{36}$$
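Hence

$$(\sigma_i^{\rm MB,ext})^2 = (A_i^{\rm MB})^2 \sigma_f^2 = \frac{\sigma_f^2}{\langle f \rangle^2} \langle f_i \rangle^2 \tag{37}$$

with

$$\sigma_f^2 = \langle f^2 \rangle - \langle f \rangle^2 \tag{38}$$

Hence, one always has

$$\sigma_i^{\rm MB,ext} = \frac{\sigma_f}{\langle f \rangle} \langle f_i \rangle \tag{39}$$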



The dispersion of the external component, if defined from (33) and (34), is thus exactly proportional to the mean value of the local data.
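Because Eq. (39) is an identity, it can be confirmed numerically on any data set. The sketch below draws arbitrary positive series (log-normal values with unit-dependent scales, a hypothetical choice) and takes $f(t)$ to be the average over units; this is an assumption, but taking the sum instead leaves $f_i^{\rm MB,ext}$ unchanged, since the factor cancels between Eqs. (33) and (34). Averages $\langle \cdot \rangle$ are time averages and standard deviations are population ones, as in Eq. (38).

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 50, 20
# Arbitrary positive data f_i(t): rows are times t, columns are units i (hypothetical)
f = rng.lognormal(mean=0.0, sigma=1.0, size=(T, N)) * rng.uniform(1.0, 10.0, size=N)

f_global = f.mean(axis=1)                  # f(t): average over units (assumption)
mean_fi = f.mean(axis=0)                   # <f_i>: time average of each unit
A_MB = mean_fi / f_global.mean()           # Eq. (33)
f_ext = np.outer(f_global, A_MB)           # Eq. (34): f_i^{MB,ext}(t) = A_i^{MB} f(t)

sigma_ext = f_ext.std(axis=0)              # sigma_i^{MB,ext}: population std over time
sigma_f = f_global.std()                   # Eq. (38)
predicted = sigma_f / f_global.mean() * mean_fi   # Eq. (39)
print(np.allclose(sigma_ext, predicted))   # True, whatever the data (with <f> != 0)
```

Changing the seed or the distribution of `f` does not affect the outcome, which is the point made above: the proportionality reflects the definitions, not the data.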



SUPPORTING INFORMATION 3. SYNTHETIC SERIES: CORRELATED RANDOM WALKERS

We considered the case where the external trend is

$$F(t) = \sin(\omega t) \tag{40}$$



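The noises (approximately Gaussian, as sums of many independent uniform variables) are given by

$$\xi_i(t) = a \sum_{j=1}^{M} u_j^{(0)}(t) + \sum_{j=M+1}^{N} u_j^{(i)}(t) \tag{41}$$

where the $u_j^{(0)}(t)$ and $u_j^{(i)}(t)$ are independent uniform random variables of zero mean and variance equal to 1/12. In this case, the correlations between different noises are governed by the parameters $a$ and $M$: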



$$\overline{\xi_i \xi_j} = \frac{a^2 M}{12} + \frac{N - M}{12}\, \delta_{ij} \tag{42}$$



When $M = 0$, the variables $\xi_i$ and $\xi_j$ are independent (for $i \neq j$), and we can tune the correlations by increasing the value of $M$. In figure 12 we plot $N = 100$ random walkers in the usual uncorrelated case and in the presence of correlations.
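Eqs. (41) and (42) can be checked by direct simulation. The minimal sketch below assumes uniform variables on $[-1/2, 1/2]$ (which have zero mean and variance 1/12, as stated) and uses $N = 100$ with $a = 1$, $M = 10$, so that $a^2 M = 10$ as in figure 12(B); the split between $a$ and $M$ and the number of time steps are our own choices. It draws the noises at independent time steps and compares the empirical covariances with Eq. (42); how the noises are combined with $F(t)$ into random walkers is not reproduced here.

```python
import numpy as np

def correlated_noises(N, M, a, T, rng):
    """Draw xi_i(t) for i = 1..N, t = 1..T following Eq. (41).

    The M variables u_j^(0)(t) are shared by all units; the N - M variables
    u_j^(i)(t) are private to unit i. All are independent and uniform on
    [-1/2, 1/2] (zero mean, variance 1/12).
    """
    shared = rng.uniform(-0.5, 0.5, size=(T, M)).sum(axis=1)  # shape (T,)
    private = np.zeros((T, N))
    for _ in range(N - M):                                     # sum over j = M+1..N
        private += rng.uniform(-0.5, 0.5, size=(T, N))
    return a * shared[:, None] + private                       # shape (T, N)

rng = np.random.default_rng(2)
N, M, a, T = 100, 10, 1.0, 5000
xi = correlated_noises(N, M, a, T, rng)

cov = np.cov(xi, rowvar=False)                 # empirical N x N covariance
off_diag = cov[~np.eye(N, dtype=bool)].mean()  # average over i != j
diag = np.diag(cov).mean()                     # average over i == j
print(f"i != j: {off_diag:.3f}  vs  a^2 M / 12 = {a**2 * M / 12:.3f}")
print(f"i == j: {diag:.3f}  vs  (a^2 M + N - M) / 12 = {(a**2 * M + N - M) / 12:.3f}")
```

Setting `M = 0` removes the shared term, and the off-diagonal average then fluctuates around zero, which corresponds to the independent case.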

In this simple case the exact result is given by $w(t) = F(t)$, $a_i = 1$, and $f_i^{\rm int}(t) = \xi_i(t)$. The important condition for the validity of the method is $A_i A_j \gg \langle G_i G_j \rangle$, which here reads

$$1 \gg a^2 M \tag{43}$$



For M = 0, the random noises are independent and 
our method is very accurate as shown in the main text. 

More generally, in order to assess quantitatively the efficiency of the method, we compute the Pearson correlation coefficient between the exact $f_i^{\rm int}(t)$ and the estimate $g_i$ computed with the method. We plot this coefficient versus $a^2 M$ in figure 13. This figure confirms that our method is valid and very precise provided that the correlations between local contributions are not too large (here $a^2 M < 4$).




FIG. 12: (A) $N$ uncorrelated random walkers ($N = 100$, $a^2 M = 0$). (B) Random walkers with correlations ($a^2 M = 10$).

FIG. 13: Pearson correlation coefficient between the exact local contribution and the local contribution computed with our method, for different values of the correlation ($N = 100$, results averaged over 100 realizations).



SUPPORTING INFORMATION 4. DEPENDENCE OF THE $a_i$ ON THE TIME INTERVAL

We can compute the quantities $a_i$ for the interval $[t_0, t]$ and let $t$ vary. For the crime rates in the US we then obtain figure 14(A) (in the case of the crime rates in France, the dataset is not large enough). This figure shows that for the US crime rates the $a_i$ converge to a stationary value, independent of the time interval provided it is large enough. Our method then leads to reliable results that are constant in time.
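The stability check described above can be sketched on synthetic data. As a hypothetical stand-in for the main-text estimator of the $a_i$ (not reproduced in this section), the sketch uses the eigenvector estimate of Supporting Information 1, computed here from the covariance matrix, so the loadings converge to $a_i$ times the standard deviation of the trend; the sinusoidal trend, the amplitudes and the noise level are arbitrary choices. The estimates are recomputed on expanding windows, and the largest change of any coefficient between consecutive windows is expected to become small once the window is long enough.

```python
import numpy as np

def leading_loadings(X):
    """Hypothetical stand-in estimator: sqrt(lambda_max) * psi from the covariance of X (rows = time)."""
    C = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    psi = eigvecs[:, -1]
    if psi.sum() < 0:              # eigenvectors are defined up to a sign
        psi = -psi
    return np.sqrt(eigvals[-1]) * psi

rng = np.random.default_rng(3)
T, N = 400, 30
t = np.arange(T)
F = np.sin(2 * np.pi * t / 50)                  # hypothetical global trend
A_true = rng.uniform(0.5, 1.5, size=N)          # hypothetical amplitudes
X = np.outer(F, A_true) + 0.5 * rng.standard_normal((T, N))

# Expanding windows [0, t_end): the analogue of letting t vary in [t_0, t]
ends = list(range(50, T + 1, 50))
estimates = np.array([leading_loadings(X[:end]) for end in ends])
# Largest change of any coefficient between consecutive windows
drift = np.abs(np.diff(estimates, axis=0)).max(axis=1)
for end, d in zip(ends[1:], drift):
    print(f"window [0, {end}): max change = {d:.3f}")
```

On real data such as the S&P500 series, a drift that stays large as the window grows would signal that the coefficients are not stationary.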

We also tested our method on the financial time series given by the 500 most important stocks in the US economy [1], whose composition leads to the S&P 500 index. Here the 'local' units are the individual stocks ($i = 1, \dots, N = 500$), and the (naive) average, analogous to a national average, is precisely the S&P 500 index time series. We study the time series of these stocks over the 252 days of the period 10/2007 - 10/2008, and we compute the global pattern $w(t)$, the coefficients $a_i$, and the parameters $\eta_i$ (defined in the main text) for the time window [10/2007, $t$], with $t$ varying from 04/2008 to 09/2008. These quantities $\eta_i$ measure quantitatively the importance of local versus external fluctuations for stock $i$. The results for the $\eta_i$ are shown in figure 14(B) and display large variations, particularly as we approach October 2008, a period of financial crisis. It is therefore not completely surprising that the $\eta_i$ (and the $a_i$) fluctuate a lot in this case. In some sense, we can conclude that the $a_i$ correspond to an average susceptibility to the global trend; they are not invariable quantities and can vary from one period to another. This example thus shows that it is important to check the stability of the coefficients $a_i$, which is a crucial assumption of our method. The variations of these coefficients are nevertheless interesting, and further studies are needed to understand them.

FIG. 14: (A) Coefficients $a_i$ computed in the case of US crime for the interval [1960, $t$] with varying $t$ (in years). (B) Coefficients $\eta_i$ computed for the S&P500 in the interval [0, 125 + $t$] ($t$ is in days in this case).

[1] Historical Data for S&P 500 stocks, biz.swcp.com/stocks/
SUPPORTING INFORMATION 5. OBESITY IN THE US: VARIANCES OF THE EXTERNAL AND INTERNAL CONTRIBUTIONS

For the obesity rate series, we compare the variances of the internal ($f_i^{\rm int}$) and the external ($a_i w$) contributions. We observe in figure 15 that the variance of the external contribution became dominant after the year 2000.





FIG. 15: Comparison of internal and external fluctuations for obesity in the US. We represent the total variance of the signal ($f_i$) and the variances of the external ($a_i w$) and internal ($f_i^{\rm int}$) contributions. The external contribution has dominated since the year 2000.